Search in Complex Networks: a New Method of Naming

We suggest a method for routing when the source does not possess full information about the
shortest path to the destination. The method is particularly useful for scale-free networks, and
exploits their unique characteristics. By assigning new (short) names to nodes (aka labelling) we are
able to reduce significantly the memory requirement at the routers, yet we succeed in routing with
high probability through paths very close in distance to the shortest ones.

In recent years it has been shown that many real-world networks, such as technological, social
and biological networks, and in particular the Internet, are scale free, i.e. have a power-law
degree distribution: the probability of a site to have degree $k$ is $P(k) \sim k^{-\gamma}$, where
for the Internet it is assumed that $\gamma$ lies between 2.1 and 2.5. One of the most important
tasks in networking is routing. Efficient routing is necessary in order to provide efficient
transportation and utilization of the network resources. In the context of communication
networks, routing is an important task in packet-switched networks as well as in overlay
networks (such as Peer-to-Peer). In this paper we present a method for searching for nodes and
routing where no knowledge of the location of the destination node is given. Such methods are
usually known as "compact routing" schemes.

In order to obtain good results several variables should 
be considered: 

• The stretch is defined as the ratio of the actual 
routing path to the shortest path between two given 
nodes. The smaller the ratio, the more efficient the
communication in the network. 

• The table size is the number of entries kept in the 
storage of each node. The smaller the table the 
more efficient the scheme in terms of memory re- 
quirements. 

• The label size is the number of bits representing the
name (or address) of each node. The smallest pos- 
sible label size needed to distinguish between sites 
with a unique id is logarithmic. Most efficient rout- 
ing schemes use larger labels in order to present 
more information about the node. 

In many cases it is desirable to design approximate routing schemes that require considerably
smaller tables, at the cost of allowing for higher stretch (shortest path routing not
guaranteed) and larger labels.
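The three quantities listed above can be made concrete with a minimal Python sketch. The function names are hypothetical (they do not come from the paper), and the label bound assumes that node ids are plain integers in the range 0..N-1 stored with a fixed width.

```python
def stretch(route_length: int, shortest_length: int) -> float:
    """Ratio of the routed path length to the shortest path length."""
    if shortest_length <= 0:
        raise ValueError("stretch is defined only for distinct, connected nodes")
    return route_length / shortest_length


def table_size(table: dict) -> int:
    """Number of entries stored at a node."""
    return len(table)


def min_label_bits(num_nodes: int) -> int:
    """Fewest bits that give each of num_nodes nodes a unique id (logarithmic in N)."""
    # (N - 1).bit_length() equals ceil(log2(N)) for N >= 2, without floating point.
    return max(1, (num_nodes - 1).bit_length())
```

For example, a route of 4 hops between nodes at distance 3 has stretch 4/3, and a network of 10000 nodes needs at least 14 bits per unique id.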

Partial knowledge search in small-world lattice-based networks and in power-law networks has
been investigated previously. The first work on generalized routing with a tradeoff of table
size vs. label size and stretch was given by Peleg and Upfal [8]. This scheme has later been
extended by Thorup and Zwick and by Cowen. All those schemes require a rather large table (of
order $N^{1/2}$ to ensure an upper bound of 3 for the stretch, or, in general, $O(N^{2/(s+1)})$
for an odd stretch $s$). A numerical study of the actual stretch for scale-free networks has
shown that the actual performance of the above routing schemes, in terms of the average
stretch, is much better than the worst case guarantee.

In this paper we discuss a class of routing schemes with a parameter $H$ ($1 < H < N$), which is
proportional to the memory requirement at the nodes. We give arguments showing that the ratio
of the average routing distance to the average shortest path is below 2 with high probability,
no matter what $H$ is. For scale-free networks the stretch is usually much lower, and we show
analytically and numerically that even for very small values of $H$, $H = O(\log^{\nu} N)$ for
$\nu > 0$, the actual stretch is very close to 1. Thus, a routing scheme that requires
substantially smaller tables and poly-logarithmic labels (see below) may lead to very
efficient routing. When compared properly, our scheme is more efficient than previous ones;
moreover, our scheme is simpler and more intuitive (e.g. it does not involve randomization),
and the trade-off between performance and memory requirements is controllable.

The random network model we use here is the Configuration Model of [12]. The networks in this
model are created by the following process: given a network with $N$ nodes and a degree
sequence $k_i$, $1 \le i \le N$, create a list containing $k_i$ copies of each node $i$, and
choose a random matching on this list to create the edges of the network. We ignore self loops
and multiple edges, which are statistically insignificant [13].
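The construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical, nodes are numbered from 0, and when the degree sum is odd one stub is simply discarded (the text does not say how that case is handled).

```python
import random


def configuration_model(degrees, seed=None):
    """Return a list of neighbor sets built by randomly matching degree 'stubs'.

    Self loops and multiple edges are dropped, so the final degree of a node
    may be slightly below the requested one.
    """
    rng = random.Random(seed)
    # One copy ("stub") of node i for every unit of its degree k_i.
    stubs = [node for node, k in enumerate(degrees) for _ in range(k)]
    if len(stubs) % 2:
        stubs.pop()  # assumption: discard one stub if the degree sum is odd
    rng.shuffle(stubs)  # a uniform shuffle, paired off below, is a random matching
    adj = [set() for _ in degrees]
    for a, b in zip(stubs[0::2], stubs[1::2]):
        if a != b:         # ignore self loops
            adj[a].add(b)  # sets silently ignore multiple edges
            adj[b].add(a)
    return adj
```

A call such as `configuration_model([3, 2, 2, 1, 1, 1], seed=1)` returns one neighbor set per node; the seed only makes the random matching reproducible.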

The main degree sequence we will discuss is that of scale-free networks: $P(k) \sim k^{-\gamma}$
(with $k \ge k_{\min}$). This degree sequence has been shown to exist naturally in many
networks, in particular the Internet and P2P networks [14], as discussed above. Another degree
sequence which we will use for comparison is that of the Erdős-Rényi (ER) random network
model, $P(k) = e^{-\langle k \rangle} \langle k \rangle^{k} / k!$.
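A scale-free degree sequence of the kind discussed above can be drawn as follows. This is a hedged sketch rather than the procedure used for the paper's simulations: the function name, the discrete (truncated) form of the power law, and the default cutoff `k_max = n - 1` are all assumptions made here for illustration.

```python
import random


def power_law_degrees(n, gamma, k_min=1, k_max=None, seed=None):
    """Draw n degrees with P(k) proportional to k**(-gamma) for k_min <= k <= k_max."""
    rng = random.Random(seed)
    if k_max is None:
        k_max = n - 1  # assumption: a node can have at most n - 1 distinct neighbors
    ks = list(range(k_min, k_max + 1))
    weights = [k ** (-gamma) for k in ks]  # unnormalized; choices() normalizes them
    return rng.choices(ks, weights=weights, k=n)
```

The resulting list can be used directly as the degree sequence $k_i$ of the configuration model.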

The proposed routing scheme consists of two stages: 
the preprocessing and the actual routing. 

Preprocessing The $H$ highest degree nodes are designated as the "hubs". (Ties in the degree are
broken arbitrarily.) For each site $i$ the closest hub $h_i$ is searched (ties are broken by
degree). Designate the shortest path from site $i$ to its hub $h_i$ by
$i \to v_{i,1} \to v_{i,2} \to \cdots \to v_{i,n_i-1} \to h_i$. The label for site $i$ will be
$L_i = (i, v_{i,1}, v_{i,2}, \ldots, v_{i,n_i-1}, h_i)$. The routing table for each node in the
network contains the link leading to the shortest path for each of the hubs, as well as a list
of all of its immediate neighbors.

Actual Routing Assume a packet is sent from some 
initial node towards the destination node t. As the 
packet reaches some intermediate node x, it is han- 
dled by the following algorithm: 

1. If x = t then stop. 

2. If t is a neighbor of x, then send the packet 
directly to t. 

3. Otherwise, if $x \in L_t$, i.e. $x = v_{t,j}$ for some $j$,
then move the packet to $v_{t,j-1}$.

4. Otherwise, search for $h_t$ in the table and send
the packet through the appropriate link.
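The two stages can be sketched in Python as below. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, the network is assumed to be given as a dict mapping each node to the set of its neighbors (which plays the role of the neighbor list in the routing table), and the network is assumed to be connected so that every node can reach the hub of the destination.

```python
from collections import deque


def bfs_tree(adj, root):
    """Return (distance, next hop towards root) for every node reachable from root."""
    dist, next_hop = {root: 0}, {root: root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                next_hop[v] = u  # first link of a shortest path from v to root
                queue.append(v)
    return dist, next_hop


def preprocess(adj, num_hubs):
    """Build the labels L_i and the hub part of every routing table."""
    # The num_hubs highest-degree nodes; ties are broken by iteration order.
    hubs = sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:num_hubs]
    trees = {h: bfs_tree(adj, h) for h in hubs}
    # Table of node v: for each reachable hub, the neighbor leading towards it.
    tables = {v: {h: trees[h][1][v] for h in hubs if v in trees[h][1]} for v in adj}
    labels = {}
    for v in adj:
        reachable = [h for h in hubs if v in trees[h][0]]
        if not reachable:
            continue  # no hub in this component, so no label can be built
        # Closest hub; ties are broken in favor of the higher degree.
        hub = min(reachable, key=lambda h: (trees[h][0][v], -len(adj[h])))
        path = [v]
        while path[-1] != hub:
            path.append(trees[hub][1][path[-1]])
        labels[v] = path  # (v, v_1, ..., h_v)
    return labels, tables


def route(adj, labels, tables, source, target):
    """Return the list of nodes visited by a packet sent from source to target."""
    label = labels[target]  # carried unchanged in the packet header
    hub = label[-1]
    path = [source]
    while path[-1] != target:               # step 1: stop at the destination
        x = path[-1]
        if target in adj[x]:                # step 2: destination is a neighbor
            nxt = target
        elif x in label:                    # step 3: walk down the label
            nxt = label[label.index(x) - 1]
        else:                               # step 4: head for the hub of target
            nxt = tables[x][hub]
        path.append(nxt)
    return path
```

On a small tree with a single hub (node 0), a packet from node 5 to node 6 climbs to the hub and then descends along the label of node 6:

```python
tree = {0: {1, 2, 3, 4}, 1: {0, 5}, 2: {0}, 3: {0}, 4: {0, 6}, 5: {1}, 6: {4}}
labels, tables = preprocess(tree, num_hubs=1)
print(labels[6])                              # [6, 4, 0]
print(route(tree, labels, tables, 5, 6))      # [5, 1, 0, 4, 6]
```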

Let us first show that our method is efficient in terms of average running time.

Preprocessing: Choosing the $H$ hubs and sorting them can be done in $O(N + H \log H)$ time [15].
Next, from each hub we need only to start a Breadth First Search, keeping for each node $x$
that is reached its distance to the root and its predecessor (storing those in $x$'s routing
table). Next, for each node we decide which is the closest hub, find the path to that hub, and
store it as its new label. All of this can be done in $O(MH)$ time, where $M$ is the total
number of edges (which is of the order of $N$ in practical cases). Note that this running time
is better than in previously suggested schemes.

Routing decision: In each decision we need to search either the label or the routing table. In
practical cases the label size is extremely small and can be considered constant; the routing
table can be implemented as a hash table to provide average constant access time. Therefore we
conclude that an average routing decision can be done in constant time.

We now look at the average distance travelled by a 
packet relative to the average shortest path in the net- 
work. The average is taken over all pairs and all config- 
urations of the network in the network model presented 
above. 

We use the following lemma. Let $a_1$ and $a_2$ be nodes with respective degrees
$k_{a_1} > k_{a_2}$, and let $b$ be any other random node. Denote by $d(a, b)$ the length of the
shortest path between nodes $a$ and $b$; then we claim that

$$P(d(a_1, b) \le l) \ge P(d(a_2, b) \le l) \qquad (1)$$

for all $l$.

To see that, we consider only cases in which the paths $a_1 \to b$ and $a_2 \to b$ exist
(otherwise the distance is not defined). Now fix the connections in the sub-network formed by
deleting $a_1$ and $a_2$ from the original network, and consider the links between this
sub-network and $\{a_1, a_2\}$. Assume that $p$ of the links lead to paths of length $l$, which
is the length of the shortest path to $b$.



If the network is with high probability fully connected (as in random networks in which all
degrees are at least 3 [16], and in the case of the Internet), then the ratio of the number of
matchings for which $d(a_1, b) = l$ and $d(a_2, b) > l$ to the number of those where
$d(a_1, b) > l$ and $d(a_2, b) = l$ is $\binom{k_{a_1}}{p} / \binom{k_{a_2}}{p}$, and therefore
the distance is a non-increasing function of the degree.

In cases where the network is not fully connected, we must condition the relevant matchings on
the demand that both $a_1$ and $a_2$ are connected to $b$. It can be shown that also in these
cases Eq. (1) is valid. Therefore we conclude that $\langle d(a, b) \rangle$, for some random
node $b$, is a non-increasing function of $k_a$:

$$\forall a_1, a_2, b: \quad k_{a_1} < k_{a_2} \;\Rightarrow\; \langle d(a_2, b) \rangle \le \langle d(a_1, b) \rangle \qquad (2)$$

Next we use the notation $d(a, b)$ for the length of the shortest path between nodes $a$ and
$b$, and $r(a, b)$ for the distance travelled by a packet sent from $a$ to $b$ using the above
algorithm (notice that $r(a, b)$ need not be symmetric, as opposed to $d(a, b)$). We argue that
in the proposed routing scheme the expected average stretch satisfies $S \le 2$.

Denote the source node as $s$, the destination as $t$, the hub of $t$ as $h_t$, and the lengths
of the direct paths between them as $d(s, t)$, $d(s, h_t)$, $d(t, h_t)$. By the construction of
the scheme:

$$S = \frac{\langle r(s,t) \rangle}{\langle d(s,t) \rangle} \le \frac{\langle d(s,h_t) + d(h_t,t) \rangle}{\langle d(s,t) \rangle} = \frac{\langle d(s,h_t) \rangle}{\langle d(s,t) \rangle} + \frac{\langle d(h_t,t) \rangle}{\langle d(s,t) \rangle}. \qquad (3)$$

Consider first the case that the hub $h_t$ is just a random node, call it $r$. Because of
symmetry, there is no reason why any of the distances $d(s,t)$, $d(s,r)$, $d(r,t)$ would be
larger than the others; therefore on average the total routing distance $d(s,r) + d(r,t)$ is
just twice the shortest distance $d(s,t)$, i.e. the average stretch is 2.
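Written out, the symmetry argument amounts to the following short calculation, where $r$ is a uniformly random node and $\langle \cdot \rangle$ denotes the average over pairs and network realizations (the angle-bracket notation for averages is assumed here to match the rest of the text):

```latex
\langle d(s,r) \rangle = \langle d(r,t) \rangle = \langle d(s,t) \rangle
\quad\Longrightarrow\quad
\frac{\langle d(s,r) + d(r,t) \rangle}{\langle d(s,t) \rangle}
  = \frac{\langle d(s,r) \rangle}{\langle d(s,t) \rangle}
  + \frac{\langle d(r,t) \rangle}{\langle d(s,t) \rangle}
  = 1 + 1 = 2 .
```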

This is true for any random node being a hub, but we are choosing the hubs as nodes with high
degree. Since Eq. (2) states that the average distance between a random node and a hub is
smaller than the distance between two random nodes, we expect the average distances to and
from the hub to be small, i.e. we expect $d(s,h_t) < d(s,t)$ and $d(h_t,t) < d(s,t)$; thus we
expect that the average stretch $S < 2$.

(The cases in which $k_s, k_t > k_{h_t}$ are treated easily: since $h_t$ is the hub of $t$, then
even if $s$ is a hub, by the definition of the scheme $h_t$ is closer to $t$ than $s$, and
$d(h_t,t) < d(s,t)$; if $t$ is a hub the routing is shortest path by construction. Thus we can
assume that $s$ and $t$ are not hubs and $k_s, k_t < k_{h_t}$.)

Note that direct application of Eq. (2) is not possible, since in the derivation we assumed
the three nodes $\{a_1, a_2, b\}$ are fixed, while in our case rewiring might cause $h_t$ not to
be the hub closest to $t$ anymore. Nevertheless, there is no reason to assume the inequalities
will be invalid for the reduced configuration space where we force $h_t$ to be the hub closest
to $t$. Paradoxically, if there is only one hub, then the three nodes are fixed and we can apply
Eq. (2) directly to prove $S \le 2$. It is however obvious, and confirmed by simulations, that
increasing $H$ would decrease the stretch.

Other properties of the proposed scheme are: 

1. The label size (in bits) for the proposed scheme is 
at most (D + 1) log N, where D is the diameter of 
the network. 

2. The table size at every node contains H + k entries, 
where k is the degree. 

3. The contents of the packet need not be changed
during the routing process.

4. The scheme is a shortest-path routing for a tree. 

To explain 1, recall that the label contains the shortest 
path to the closest hub. The distance is at most D (and 
add one for the site itself), and each node requires at 
most $\log N$ bits to identify. Thus, property 1 follows.
The second and third parts follow from the definition 
of the scheme. The fourth follows since in a tree there 
is only one path between any two nodes, so either the 
hub is on the path, or the destination is on the path to 
the hub, or there exists some node in the path to the 
hub which is also on the path to the destination. (Put
differently, if there were a shortest path different from
the path source → hub → destination, then a loop would
be formed, contradicting the network being a tree.)
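Property 1 can be checked with a small calculation. The sketch below is illustrative only: the function name is hypothetical, and it assumes that every node id in the label is stored with a fixed width of $\lceil \log_2 N \rceil$ bits.

```python
def label_bits(path_to_hub_length: int, num_nodes: int) -> int:
    """Bits in a label listing the node itself and every node on its path to the hub."""
    bits_per_id = max(1, (num_nodes - 1).bit_length())  # ceil(log2(N)) for N >= 2
    # A path of length d visits d nodes after the start, so the label has d + 1 ids.
    return (path_to_hub_length + 1) * bits_per_id
```

For a network of 10000 nodes, a site two hops away from its hub gets a label of 3 ids of 14 bits each, i.e. 42 bits, and a hub's own label is a single 14-bit id. Setting the path length to the diameter $D$ gives the bound of property 1.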

For scale-free networks we can show some better bounds on the label size and the stretch. It
has been shown that with high probability the average distance between nodes is
$O(\log \log N)$ and the diameter is $O(\log N)$ (for $k_{\min} > 2$ the diameter is also
expected to be $O(\log \log N)$). Therefore, it can be concluded that the maximum label size is
of order $O(\log^2 N)$ and the average label size is $O(\log N \log \log N)$. For scale-free
networks with $\gamma < 3$, a tighter bound for the stretch can be obtained. The radius of the
core (the location of all high degree nodes) is of order $\log \log N$, and almost all the mass
is concentrated outside the core (see, e.g., [17, 19]). Now, looking at a ball around a random
site with a radius a little smaller than the radius of the network, it is expected that the
ball will not include the largest hub (since most sites are outside the core). Since the size
of the largest hub is of order $O(N^{1/(\gamma-1)}) > N^{1/2}$ for $\gamma < 3$, it is expected
that the ball has fewer than $N^{1/2}$ outgoing links (since any 2 balls with more than
$N^{1/2}$ outgoing links are connected with high probability). Any 2 such balls are not
expected to be connected to each other, since the product of their "degrees" (numbers of
outgoing links) is less than $N$, so the distance between any two random sites is expected to
be almost twice the radius (a rigorous proof of this has been given elsewhere). Thus the path
through the hubs is almost optimal with high probability, and the stretch between 2 randomly
selected sites is expected to approach 1 for large $N$.

FIG. 1: Label size distribution for scale-free networks. A typical label is extremely short,
which makes our scheme efficient also in terms of bandwidth utilization.
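To make the comparison between the largest hub and $N^{1/2}$ in the argument above concrete, the following sketch evaluates the standard order-of-magnitude estimate $N^{1/(\gamma-1)}$ for the largest degree, using the values $N = 10000$ and $\gamma = 2.3$ quoted for the simulations. The function name is hypothetical, and the estimate ignores constant prefactors.

```python
def largest_hub_degree_scale(n: int, gamma: float) -> float:
    """Order-of-magnitude estimate N**(1/(gamma-1)) of the largest degree."""
    return n ** (1.0 / (gamma - 1.0))


n, gamma = 10_000, 2.3
k_max = largest_hub_degree_scale(n, gamma)  # roughly 1.2e3
sqrt_n = n ** 0.5                           # 100.0
print(k_max > sqrt_n)                       # True, as expected for gamma < 3
```

At $\gamma = 3$ the two scales coincide, which is why the argument is restricted to $\gamma < 3$.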

One other nice property of the proposed scheme is that the labelling and table construction
can be achieved using a distributed rather than a centralized algorithm, and in an efficient
manner (the number of message transfers needed is almost linear in the size of the network
times $H$; details are to be published elsewhere). Node or link failures can be bypassed in a
standard way, without affecting the other nodes of the network. Having all the above
properties in mind, our scheme can be considered seriously for applications in real-world
systems, in which there is not always a central management of the network that has knowledge
of the topology of the entire network.

To demonstrate the efficiency of the scheme, we present computer simulation results. For all
networks, we use the parameters $N = 10000$, $\gamma = 2.3$, and average over many
realizations. (The stretch of a network is calculated as an average over the stretch of all
pairs.) To begin with, we verify that the labels are indeed small (Fig. 1).

Next we have tested the scheme with the most recent representation of the Internet at the AS
level [21]; the average stretch factor turned out to be as low as 1.067, with 79% of paths
shortest (as opposed to 1.09 and 71% in [13]). In Fig. 2 we show the cumulative distribution of
stretch values for routing between all pairs in a random realization of the configuration
model (with power-law degree distribution), for different system sizes. It can be seen that
not only are most of the routes along the shortest path, but exceptionally high stretches also
become more and more rare as the system grows.
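The quantities reported above (average stretch, fraction of shortest paths, and the inverse cumulative distribution of the stretch) can be computed from per-pair route and shortest-path lengths as in the following sketch. The function names are hypothetical and the input lists in the example are made-up toy values, not simulation data.

```python
def stretch_statistics(route_lengths, shortest_lengths):
    """Return (average stretch, fraction of shortest routes, per-pair stretches)."""
    # Pairs at distance 0 (a node with itself) have no defined stretch.
    pairs = [(r, d) for r, d in zip(route_lengths, shortest_lengths) if d > 0]
    stretches = [r / d for r, d in pairs]
    average = sum(stretches) / len(stretches)
    fraction_shortest = sum(1 for r, d in pairs if r == d) / len(pairs)
    return average, fraction_shortest, stretches


def inverse_cumulative(stretches, value):
    """Fraction of pairs whose stretch is strictly larger than value."""
    return sum(1 for s in stretches if s > value) / len(stretches)
```

For the toy input `stretch_statistics([2, 3, 4, 6], [2, 3, 2, 4])` the per-pair stretches are 1, 1, 2 and 1.5, so the average stretch is 1.375 and half of the routes are shortest paths.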

Fig. 3 shows the average stretch value as a function of the network size, compared for a few
values of $\nu$ (in $H \sim \log^{\nu} N$) in power-law networks, and for $H \sim \log^{3} N$
for ER networks. It can be seen that the average stretch in the scale-free networks is
significantly better than in the ER case and is virtually independent of the network size. One
can also see that the stretch depends only weakly on the number of hubs; therefore, to achieve
an efficient routing, one need not use too many hubs.

FIG. 2: Stretch distribution for a scale-free network ($H \sim \log N$). The (inverse)
cumulative probability distribution is shown, i.e. for a given stretch value, we see the
probability to have a larger stretch. In the case of $N = 10000$, 75% of the paths are the
shortest ones.

FIG. 3: Average stretch vs. network size, for scale-free networks with different numbers of
hubs, $H \sim \log^{\nu} N$, $\nu = 0, 1, 2, 3$, and an ER network ($\langle k \rangle = 7$) with
$\nu = 3$. In all simulations $H$ was scaled such that $H(N = 10000) = 100$. It can be seen that
the performance of the scheme is much better for the scale-free network, with virtually no
dependence on the network size and the number of hubs.

In Fig. 4 we study the variation in the stretch when the parameters of the power-law degree
distribution are changed. We compute the stretch for $k_{\min} = 1, 2, 3$ and for various values
of $\gamma$. The behaviour of the stretch can be explained as follows. As we move to higher
values of $\gamma$, the network becomes more sparse and tree-like. On the one hand, recall that
the scheme is optimal for a tree structure; on the other hand, when $\gamma$ increases we have
fewer and fewer "real hubs" and the network becomes similar to an ER network, on which the
scheme performs worse, as shown above. For $k_{\min} = 1$ the tree structure effect is much
stronger; for $k_{\min} = 3$ many loops remain, thus the effect of losing the hubs is stronger;
for $k_{\min} = 2$ neither of the effects is more significant.

In summary, we have presented an efficient method for routing or searching in an environment
where full knowledge of the network topology is not available. Our scheme changes the names of
the nodes to more meaningful names that contain the path to the closest hub, where the hubs
are chosen as the nodes with highest degree. We have shown that this simple and intuitive
method can be extremely useful in scale-free networks, such as the Internet. Using computer
simulations, we have explored the performance of our scheme with variations in the network and
scheme parameters.

FIG. 4: Average stretch vs. $\gamma$ for a power-law network with $k_{\min} = 1, 2, 3$.